Adaptive Stream Fusion in Multistream Recognition of Speech

Authors

Nima Mesgarani

Samuel Thomas

Hynek Hermansky

Abstract

A new method to deal with variable distortions of speech during the operation of the system is proposed. First, multiple processing streams are formed by extracting different spectral and temporal modulation components from the speech signal. Information in each stream is used to estimate posterior probabilities of phonemes. Initial values for a weighted integration of these individual estimates are found by normalized cross-correlation of the estimates with the actual phoneme labels on the training data. A statistical model of the final estimated posterior probabilities is used to characterize the system performance. During the operation, the weights in the linear fusion are adapted using particle filtering to optimize the performance. Results on phoneme recognition from noisy speech indicate the effectiveness of the proposed method.
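The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the normalized cross-correlation is computed (the zero-lag, uncentered form is assumed here), how the statistical performance model is defined (the hypothetical `neg_entropy` score below is only a stand-in for it), or how the particle filter is configured (a random-walk proposal with multinomial resampling is assumed). All function and parameter names are illustrative.

```python
import numpy as np

def ncc_weights(stream_posteriors, labels):
    """Initial fusion weights from normalized cross-correlation with labels.

    stream_posteriors: array (S, T, K), per-stream phoneme posteriors.
    labels: int array (T,), reference phoneme indices on training data.
    Assumption: zero-lag, uncentered correlation against one-hot targets.
    """
    S, T, K = stream_posteriors.shape
    t = np.eye(K)[labels].ravel()              # one-hot targets, flattened
    weights = np.empty(S)
    for s in range(S):
        p = stream_posteriors[s].ravel()
        weights[s] = p @ t / (np.linalg.norm(p) * np.linalg.norm(t))
    return weights / weights.sum()             # normalize to sum to one

def fuse(stream_posteriors, weights):
    """Weighted linear fusion of stream posteriors, renormalized per frame."""
    fused = np.tensordot(weights, stream_posteriors, axes=1)   # (T, K)
    return fused / fused.sum(axis=1, keepdims=True)

def neg_entropy(fused):
    """Hypothetical performance score: higher when posteriors are sharper.

    Stand-in for the paper's statistical model, which the abstract
    does not specify.
    """
    return np.mean(np.sum(fused * np.log(fused + 1e-12), axis=1))

def particle_step(particles, stream_posteriors, score_fn, rng, jitter=0.05):
    """One adaptation step: perturb weight vectors, score them, resample.

    particles: array (N, S) of candidate fusion weight vectors.
    Returns the resampled particles and the best-scoring candidate.
    """
    N, S = particles.shape
    # Random-walk proposal, folded back to non-negative and renormalized.
    proposed = np.abs(particles + jitter * rng.standard_normal((N, S)))
    proposed /= proposed.sum(axis=1, keepdims=True)
    scores = np.array([score_fn(fuse(stream_posteriors, w)) for w in proposed])
    importance = np.exp(scores - scores.max())  # shift for numerical stability
    importance /= importance.sum()
    idx = rng.choice(N, size=N, p=importance)   # multinomial resampling
    return proposed[idx], proposed[np.argmax(scores)]
```

In use, `ncc_weights` would be run once on labeled training data, the particle set would be initialized around that result, and `particle_step` would then be called on each incoming test segment, with the best-scoring candidate passed to `fuse`.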


Similar resources

Toward optimizing stream fusion in multistream recognition of speech.

A multistream phoneme recognition framework is proposed based on forming streams from different spectrotemporal modulations of speech. Phoneme posterior probabilities were estimated from each stream separately and combined at the output level. A statistical model of the final estimated posterior probabilities is used to characterize the system performance. During the operation, the best fusion ...


Audio-Visual Speech Modeling for Continuous Speech Recognition

This paper describes a speech recognition system that uses both acoustic and visual speech information to improve the recognition performance in noisy environments. The system consists of three components: 1) a visual module; 2) an acoustic module; and 3) a sensor fusion module. The visual module locates and tracks the lip movements of a given speaker and extracts relevant speech features. This...


A Framework for Practical Multistream ASR

Robustness of automatic speech recognition (ASR) to acoustic mismatches can be improved by using a multistream architecture. Past multistream approaches involve training a large number of neural networks, one for each possible stream combination. During the testing phase, each utterance is forward passed through all the neural networks to estimate the best stream combination. In this work, we propose a new...


Discriminatively trained features using fMPE for multi-stream audio-visual speech recognition

fMPE is a recently introduced discriminative training technique that uses the Minimum Phone Error (MPE) discriminative criterion to train a feature-level transformation. In this paper we investigate fMPE trained audio/visual features for multistream HMM-based audio-visual speech recognition. A flexible, layer-based implementation of fMPE allows us to combine the visual information with the ...


A multistream multiresolution framework for phoneme recognition

Spectrotemporal representations of speech have already shown promising results in speech processing technologies; however, many inherent issues of such representations, such as high dimensionality, have limited their use in speech and speaker recognition. The multistream framework fits such representations very well, where different regions can be separately mapped into posterior probabilities of clas...




Publication date: 2011
